A hybrid format for better performance of sparse matrix-vector multiplication on a GPU
نویسندگان
چکیده
In this paper, we present a new sparse matrix data format that leads to improved memory coalescing and more efficient sparse matrix-vector multiplication (SpMV) for a wide range of problems on high throughput architectures such as a graphics processing unit (GPU). The sparse matrix structure is constructed by sorting the rows based on the row length (defined as the number of non-zero elements in a matrix row) followed by a partition into two ranges: short rows and long rows. Based on this partition, the matrix entries are then transformed into ELLPACK (ELL) or vectorized compressed sparse row (vCSR) format. In addition, the number of threads are adaptively selected based on the row length in order to balance the workload for each GPU thread. Several computational experiments are presented to support this approach and the results suggest a notable improvement over a wide range of matrix structures. Keywords— SpMV, GPU, EVC-HYB format, adaptive
منابع مشابه
Sparse Matrix-vector Multiplication on Nvidia Gpu
In this paper, we present our work on developing a new matrix format and a new sparse matrix-vector multiplication algorithm. The matrix format is HEC, which is a hybrid format. This matrix format is efficient for sparse matrix-vector multiplication and is friendly to preconditioner. Numerical experiments show that our sparse matrix-vector multiplication algorithm is efficient on
متن کاملEffective Sparse Matrix Representation for the GPU Architectures
General purpose computation on graphics processing unit (GPU) is prominent in the high performance computing era of this time. Porting or accelerating the data parallel applications onto GPU gives the default performance improvement because of the increased computational units. Better performances can be seen if application specific fine tuning is done with respect to the architecture under con...
متن کاملGPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
Many high performance computing applications require computing both sparse matrix-vector product (SMVP) and sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such a circumstance, it is critical to maintain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse blo...
متن کاملImplementing Sparse Matrix-Vector Multiplication with QCSR on GPU
We are going through the computation from single core to multicore architecture in parallel programming. Graphics Processor Units (GPUs) have recently emerged as outstanding platforms for data parallel applications with regular data access patterns. However, it is still challenging to optimize computations with irregular data access patterns like sparse matrix-vector multiplication (SPMV). SPMV...
متن کاملOptimizing Sparse Matrix-Matrix Multiplication on a Heterogeneous CPU-GPU Platform
Sparse Matrix-Matrix multiplication (SpMM) is a fundamental operation over irregular data, which is widely used in graph algorithms, such as finding minimum spanning trees and shortest paths. In this work, we present a hybrid CPU and GPU-based parallel SpMM algorithm to improve the performance of SpMM. First, we improve data locality by element-wise multiplication. Second, we utilize the ordere...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IJHPCA
دوره 30 شماره
صفحات -
تاریخ انتشار 2016